Search Result

Journals

Publication Years

Keywords

Please wait a minute...

For Selected:

Download Citations
EndNote Ris BibTeX

Toggle Thumbnails

Select

Single precision floating general matrix multiply optimization for machine translation based on ARMv8 architecture

GONG Mingqing, YE Huang, ZHANG Jian, LU Xingjing, CHEN Wei

Journal of Computer Applications 2019, 39 (6): 1557-1562. DOI: 10.11772/j.issn.1001-9081.2018122608

Abstract （701）

PDF （1002KB）（559）

Save

Aiming at the inefficiency of neural network inferential calculation executed by mobile intelligent devices using ARM processor, a set of Single precision floating GEneral Matrix Multiply (SGEMM) algorithm optimization scheme based on ARMv8 architecture was proposed. Firstly, it was determined that the computational efficiency of the processor based on ARMv8 architecture executing SGEMM algorithm was limited by the vectorized computation unit usage scheme, the instruction pipeline, and the probability of occurrence of cache miss. Secondly, three optimization techniques:vector instruction inline assembly, data rearrangement and data prefetching were implemented for the three reasons that the computational efficiency was limited. Finally, the test experiments were designed based on three matrix patterns commonly used in the neural network of speech direction and the programs were run on the RK3399 hardware platform. The experimental results show that, the single-core computing speed is 10.23 GFLOPS in square matrix mode, reaching 78.2% of the measured floating-point peak value; the single-core computing speed is 6.35 GFLOPS in slender matrix mode, reaching 48.1% of the measured floating-point peak value; and the single-core computing speed is 2.53 GFLOPS in continuous small matrix mode, reaching 19.2% of the measured floating-point peak value. With the optimized SGEMM algorithm deployed into the speech recognition neural network program, the actual speech recognition speed of program is significantly improved.

Reference | Related Articles | Metrics